[ET-VK] Add alignment fields to PackedDimInfo for padded size calculation #17260
Merged
Conversation
added 2 commits
February 5, 2026 10:21
… per-shader timing

Pull Request resolved: #17105

This change improves the benchmark test harness in three ways:

1. **Reference computation caching**: Test cases are now grouped by a `ReferenceKey` that captures the inputs affecting the reference output (sizes, dtype, data generation type). Reference computation runs once per group and the results are reused, significantly speeding up test suites with many storage/layout variations of the same logical test case.
2. **Per-shader timing breakdown**: Benchmark output now shows individual shader execution times with global and local workgroup sizes, making it easier to identify performance bottlenecks when multiple shaders participate in an operator.
3. **Deferred data generation**: Tensor data is now generated lazily with explicit seeding, enabling deterministic data sharing across grouped test cases. This ensures identical inputs produce identical reference outputs for caching correctness.

Also adds string input support (`ValueSpec::make_string()`) and helper functions for concise test case naming (`layout_abbrev`, `repr_str`, `shape_string`).

ghstack-source-id: 338638546
@exported-using-ghexport

Differential Revision: [D91945038](https://our.internmc.facebook.com/intern/diff/D91945038/)
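To make the caching idea in item 1 concrete, here is a minimal sketch of grouping reference computation by key. The struct fields, cache container, and `get_or_compute_reference` helper below are illustrative assumptions, not the actual harness code; only the `ReferenceKey` name and the notion of keying on sizes/dtype/data generation come from the commit message.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical key: captures everything that affects the reference output.
struct ReferenceKey {
  std::vector<int64_t> sizes;  // input tensor sizes
  int32_t dtype;               // data type enum value
  int32_t data_gen;            // data generation strategy (e.g. random, ones)

  bool operator<(const ReferenceKey& other) const {
    return std::tie(sizes, dtype, data_gen) <
           std::tie(other.sizes, other.dtype, other.data_gen);
  }
};

using ReferenceOutput = std::vector<float>;

// Compute the reference output once per key and reuse it for every
// storage/layout variant of the same logical test case.
template <typename ComputeFn>
const ReferenceOutput& get_or_compute_reference(
    std::map<ReferenceKey, ReferenceOutput>& cache,
    const ReferenceKey& key,
    ComputeFn compute) {
  auto it = cache.find(key);
  if (it == cache.end()) {
    it = cache.emplace(key, compute()).first;
  }
  return it->second;
}
```

With deterministic, seeded data generation (item 3), every test case that maps to the same key sees identical inputs, so reusing the cached reference output is safe.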
…tion

Pull Request resolved: #17170

This change introduces separate alignment fields to `PackedDimInfo`, decoupling the alignment used for padding tensor dimensions from the block size used for packing.

Previously, `calculate_padded_sizes` used `packed_dim_block_size` and `outer_packed_dim_block_size` directly to determine how much to pad tensor dimensions. This works, but it limits flexibility: there are scenarios where we want to pad dimensions to a larger alignment than the block size for performance reasons, such as ensuring loads are aligned to cache lines or removing the need for bounds checking in shaders.

The new fields `packed_dim_align` and `outer_packed_dim_align` allow the alignment to be specified independently. For now, these are initialized to match the corresponding block sizes, preserving existing behavior. Future changes can set larger alignment values when beneficial for performance.

Authored with Claude.

ghstack-source-id: 338638551
@exported-using-ghexport

Differential Revision: [D92196649](https://our.internmc.facebook.com/intern/diff/D92196649/)
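The sketch below shows how an alignment field could drive padded-size calculation independently of the packing block size. The actual `PackedDimInfo` layout and `calculate_padded_sizes` implementation in the ET-VK runtime may differ; only the field names `packed_dim_align` and `outer_packed_dim_align` are taken from this PR, everything else is an assumption for illustration.

```cpp
#include <cstdint>
#include <vector>

namespace {

// Round `size` up to the next multiple of `alignment`.
int64_t align_up(int64_t size, int64_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// Simplified stand-in for PackedDimInfo; real struct may hold more state.
struct PackedDimInfoSketch {
  int32_t packed_dim;                   // index of the packed dimension
  int32_t packed_dim_block_size;        // block size used for packing
  int32_t packed_dim_align;             // alignment used for padding
  int32_t outer_packed_dim;             // index of the outer packed dimension
  int32_t outer_packed_dim_block_size;
  int32_t outer_packed_dim_align;
};

// Pad the packed and outer-packed dims to their alignment rather than to the
// block size; with align == block size this reproduces the previous behavior.
std::vector<int64_t> calculate_padded_sizes_sketch(
    std::vector<int64_t> sizes,
    const PackedDimInfoSketch& info) {
  sizes.at(info.packed_dim) =
      align_up(sizes.at(info.packed_dim), info.packed_dim_align);
  sizes.at(info.outer_packed_dim) =
      align_up(sizes.at(info.outer_packed_dim), info.outer_packed_dim_align);
  return sizes;
}

}  // namespace
```

For example, a packed dimension of size 7 with a block size of 4 pads to 8 today; raising `packed_dim_align` to 16 would pad it to 16 without changing how elements are packed into blocks, which is the flexibility this PR is setting up.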
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17260
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 119 Pending, as of commit 694f9b8 with merge base 1cffd23.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…perators (#17261)

Implemented quantize_per_tensor and dequantize_per_tensor GLSL shaders and C++ dispatch logic to support the new single-dimension packed INT8 layouts (kPackedInt8_4W, kPackedInt8_4C, kPackedInt8_4H). These operators enable conversion between floating-point tensors and packed int8 representations with per-tensor scale and zero-point parameters.

The implementation includes:

- GLSL shaders: quantize_per_tensor and dequantize_per_tensor with support for both texture->buffer and buffer->buffer data flows, including GL_EXT_debug_printf statements for debugging
- QuantizeDequantize.cpp: Added dispatch functions for the new layouts and registered the etvk.q_dq_8bit_per_tensor.default operator
- Test infrastructure: Created a q_dq_8bit_per_tensor test binary with DEBUG_MODE support and a reference CPU implementation for validation

The shaders implement the quantization formula Q = clamp(round(x / scale) + zp, -128, 127) and the dequantization formula x' = (Q - zp) * scale, with proper int8 packing/unpacking using little-endian byte ordering and sign extension.

Differential Revision: [D92061370](https://our.internmc.facebook.com/intern/diff/D92061370/)

[ghstack-poisoned]
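As a quick illustration of the two formulas quoted above, the following CPU sketch round-trips a value through quantize and dequantize. It is not the actual ET-VK reference implementation; the function names and signatures here are assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Q = clamp(round(x / scale) + zp, -128, 127)
int8_t quantize_per_tensor_ref(float x, float scale, int32_t zp) {
  const int32_t q = static_cast<int32_t>(std::round(x / scale)) + zp;
  return static_cast<int8_t>(std::clamp<int32_t>(q, -128, 127));
}

// x' = (Q - zp) * scale
float dequantize_per_tensor_ref(int8_t q, float scale, int32_t zp) {
  return static_cast<float>(static_cast<int32_t>(q) - zp) * scale;
}

// Round-trip a tensor's worth of values; useful for validating GPU output
// against a CPU baseline.
std::vector<float> q_dq_reference(
    const std::vector<float>& input, float scale, int32_t zp) {
  std::vector<float> out;
  out.reserve(input.size());
  for (float x : input) {
    out.push_back(
        dequantize_per_tensor_ref(quantize_per_tensor_ref(x, scale, zp), scale, zp));
  }
  return out;
}
```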
SS-JIA approved these changes on Feb 5, 2026.
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #17170 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/405/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/405/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/398/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/405/orig
Differential Revision: D92196649
@diff-train-skip-merge